{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Tce3stUlHN0L"
      },
      "source": [
        "##### Copyright 2021 The TensorFlow Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "cellView": "form",
        "id": "tuOe1ymfHZPu"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qFdPvlXBOdUN"
      },
      "source": [
        "# Recommending movies: ranking"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MfBg1C5NB3X0"
      },
      "source": [
        "<table class=\"tfa-notebook-buttons\" align=\"left\">\n",
        "  <td>\n",
        "    <a target=\"_blank\" href=\"https://github.com/tensorflow/recommenders-addons/blob/master/docs/tutorials/dynamic_embedding_tutorial.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on GitHub</a>\n",
        "  </td>\n",
        "</table>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xHxb-dlhMIzW"
      },
      "source": [
        "## Overview\n",
        "In this tutorial, we're going to use the rating data to predict the user's rating of other movies. To achieve this goal, we will follow the following steps:\n",
        "\n",
        "1. Get our data and do some preprocessing to get the required format.\n",
        "2. Implement a neural collaborative filtering(NeuralCF) model.\n",
        "3. Train the model.\n",
        "\n",
        "Different from the general recommendation model, the model we implemented replaces `tf.nn.embedding_lookup` with `tfra.dynamic_embedding.embedding_lookup`, which can handle super large sparse features."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MUXex9ctTuDB"
      },
      "source": [
        "## Imports\n",
        "Let's first get our imports out of the way."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "rEk-ibQkDNtF"
      },
      "outputs": [],
      "source": [
        "!pip install -q --upgrade tensorflow-recommenders-addons\n",
        "!pip install -q --upgrade tensorflow-datasets"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "IqR2PQG4ZaZ0"
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import tensorflow as tf\n",
        "from tensorflow.keras.layers import Dense"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ebc5d71b0b88"
      },
      "outputs": [],
      "source": [
        "import tensorflow_datasets as tfds\n",
        "import tensorflow_recommenders_addons as tfra"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_SixWryS704g"
      },
      "source": [
        "## Preparing the dataset\n",
        "\n",
        "This tutorial uses movies reviews provided by the MovieLens 100K dataset, a classic dataset from the GroupLens research group at the University of Minnesota. In order to facilitate processing, we convert the data type of `movie_id` and `user_id` into `int64`.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "e51a15c085f5"
      },
      "outputs": [],
      "source": [
        "ratings = tfds.load(\"movielens/100k-ratings\", split=\"train\")\n",
        "\n",
        "ratings = ratings.map(lambda x: {\n",
        "    \"movie_id\": tf.strings.to_number(x[\"movie_id\"], tf.int64),\n",
        "    \"user_id\": tf.strings.to_number(x[\"user_id\"], tf.int64),\n",
        "    \"user_rating\": x[\"user_rating\"]\n",
        "})\n",
        "\n",
        "tf.random.set_seed(2021)\n",
        "shuffled = ratings.shuffle(100_000, seed=2021, reshuffle_each_iteration=False)\n",
        "\n",
        "dataset_train = shuffled.take(100_000).batch(256)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "a79bfae6c084"
      },
      "source": [
        "## Implementing a model\n",
        "The NCFModel we implemented is very similar to the conventional one, and the main difference lies in the embedding layer. We specify the variable of embedding layer by `tfra.dynamic_embedding.get_variable`. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "a8533a80ef52"
      },
      "outputs": [],
      "source": [
        "class NCFModel(tf.keras.Model):\n",
        "\n",
        "    def __init__(self):\n",
        "        super(NCFModel, self).__init__()\n",
        "        self.embedding_size = 32\n",
        "        self.d0 = Dense(\n",
        "            256,\n",
        "            activation='relu',\n",
        "            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),\n",
        "            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))\n",
        "        self.d1 = Dense(\n",
        "            64,\n",
        "            activation='relu',\n",
        "            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),\n",
        "            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))\n",
        "        self.d2 = Dense(\n",
        "            1,\n",
        "            kernel_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1),\n",
        "            bias_initializer=tf.keras.initializers.RandomNormal(0.0, 0.1))\n",
        "        self.user_embeddings = tfra.dynamic_embedding.get_variable(\n",
        "            name=\"user_dynamic_embeddings\",\n",
        "            dim=self.embedding_size,\n",
        "            initializer=tf.keras.initializers.RandomNormal(-1, 1))\n",
        "        self.movie_embeddings = tfra.dynamic_embedding.get_variable(\n",
        "            name=\"moive_dynamic_embeddings\",\n",
        "            dim=self.embedding_size,\n",
        "            initializer=tf.keras.initializers.RandomNormal(-1, 1))\n",
        "        self.loss = tf.keras.losses.MeanSquaredError()\n",
        "\n",
        "    def call(self, batch):\n",
        "        movie_id = batch[\"movie_id\"]\n",
        "        user_id = batch[\"user_id\"]\n",
        "        rating = batch[\"user_rating\"]\n",
        "\n",
        "        user_id_val, user_id_idx = np.unique(user_id, return_inverse=True)\n",
        "        user_id_weights, user_id_trainable_wrapper = tfra.dynamic_embedding.embedding_lookup(\n",
        "            params=self.user_embeddings,\n",
        "            ids=user_id_val,\n",
        "            name=\"user-id-weights\",\n",
        "            return_trainable=True)\n",
        "        user_id_weights = tf.gather(user_id_weights, user_id_idx)\n",
        "\n",
        "        movie_id_val, movie_id_idx = np.unique(movie_id, return_inverse=True)\n",
        "        movie_id_weights, movie_id_trainable_wrapper = tfra.dynamic_embedding.embedding_lookup(\n",
        "            params=self.movie_embeddings,\n",
        "            ids=movie_id_val,\n",
        "            name=\"movie-id-weights\",\n",
        "            return_trainable=True)\n",
        "        movie_id_weights = tf.gather(movie_id_weights, movie_id_idx)\n",
        "\n",
        "        embeddings = tf.concat([user_id_weights, movie_id_weights], axis=1)\n",
        "        dnn = self.d0(embeddings)\n",
        "        dnn = self.d1(dnn)\n",
        "        dnn = self.d2(dnn)\n",
        "        out = tf.reshape(dnn, shape=[-1])\n",
        "        loss = self.loss(rating, out)\n",
        "        return loss, [user_id_trainable_wrapper, movie_id_trainable_wrapper]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7e08009d75aa"
      },
      "source": [
        "Let's instantiate the model, and wrap the optimizer in tfra.dynamic_embedding.DynamicEmbeddingOptimizer."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "db8405903104"
      },
      "outputs": [],
      "source": [
        "model = NCFModel()\n",
        "optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)\n",
        "optimizer = tfra.dynamic_embedding.DynamicEmbeddingOptimizer(optimizer)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "defd4c72ed09"
      },
      "source": [
        "## Training the model\n",
        "After defining the model, we can train the model and observe the change of loss."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "9033aa17bb63"
      },
      "outputs": [],
      "source": [
        "def train(epoch=1):\n",
        "    for i in range(epoch):\n",
        "        total_loss = np.array([])\n",
        "        for (_, batch) in enumerate(dataset_train):\n",
        "            with tf.GradientTape() as tape:\n",
        "                loss, trainable_wrapper_list = model(batch)\n",
        "                total_loss = np.append(total_loss, loss)\n",
        "            grads = tape.gradient(loss, model.trainable_variables + trainable_wrapper_list)\n",
        "            optimizer.apply_gradients(zip(grads, model.trainable_variables + trainable_wrapper_list))\n",
        "        print(\"epoch:\", i, \"mean_squared_error:\", np.mean(total_loss))\n",
        "\n",
        "if __name__==\"__main__\":\n",
        "    train(10)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fa0429125ec8"
      },
      "source": [
        "As the model trains, the loss is falling. Through the entire model definition and training process, we can find that the interface between tfra and TensorFlow maintains a good consistency. We can use tfra to build a recommendation model easily by relying on the experience of using TensorFlow, and can effectively handle super large sparse features."
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "dynamic_embedding_tutorial.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
